Nonlinear Spectral Transformations for Robust Speech Recognition

Authors

  • Shajith Ikbal
  • Hynek Hermansky
  • Hervé Bourlard
Abstract

Recently, a nonlinear transformation of autocorrelation coefficients, named Phase AutoCorrelation (PAC) coefficients, has been considered for feature extraction [1]. PAC-based features show improved robustness to additive noise as a result of two operations performed during the computation of PAC, namely energy normalization and the inverse cosine transformation. Despite this improved robustness for noisy speech, these two operations lead to some degradation in recognition performance for clean speech. In this paper, we try to alleviate this problem, first by reintroducing the energy information into the PAC-based features, and second by studying alternatives to the inverse cosine function. Simply appending the frame energy as an additional coefficient to the PAC features results in a noticeable improvement in performance for clean speech. The study of alternatives to the inverse cosine transformation leads to the conclusion that a linear transformation is best for clean speech, while nonlinear functions help to improve robustness in noise.
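The two PAC operations named above can be sketched as follows. This is a minimal sketch in Python with NumPy, assuming PAC is computed from the circular autocorrelation of a frame: because a circular shift does not change the frame's norm, energy normalization R[k]/R[0] gives the cosine of the angle between the frame and its k-sample shift, and the inverse cosine returns that angle. The function names, the `linear_transform` alternative, and the "raw PAC plus log energy" vector are hypothetical illustrations, not the paper's exact feature pipeline.

```python
import numpy as np
from typing import Callable

Transform = Callable[[np.ndarray], np.ndarray]


def circular_autocorr(frame: np.ndarray) -> np.ndarray:
    """Circular autocorrelation R[k] = sum_n x[n] * x[(n + k) mod N], k = 0..N-1."""
    n = len(frame)
    # np.roll(frame, -k)[i] == frame[(i + k) % n]: the frame circularly shifted by k samples
    return np.array([np.dot(frame, np.roll(frame, -k)) for k in range(n)])


def linear_transform(cos_theta: np.ndarray) -> np.ndarray:
    """Hypothetical linear stand-in for arccos: maps 1 -> 0, 0 -> pi/2, -1 -> pi."""
    return (np.pi / 2.0) * (1.0 - cos_theta)


def pac(frame: np.ndarray, transform: Transform = np.arccos) -> np.ndarray:
    """Phase autocorrelation of one frame (sketch).

    A circular shift does not change the norm of the frame, so R[k] / R[0]
    is the cosine of the angle between the frame and its k-sample shift.
    """
    r = circular_autocorr(np.asarray(frame, dtype=float))
    energy = r[0]  # R[0] = sum_n x[n]^2, the frame energy
    if energy <= 0.0:
        raise ValueError("frame energy must be positive (all-zero frame?)")
    # Energy normalization; clip guards against rounding just outside [-1, 1]
    cos_theta = np.clip(r / energy, -1.0, 1.0)
    # Inverse cosine by default; any other elementwise map can be passed instead
    return transform(cos_theta)


def pac_with_log_energy(frame: np.ndarray, transform: Transform = np.arccos) -> np.ndarray:
    """Hypothetical feature vector: PAC values with the log frame energy appended."""
    frame = np.asarray(frame, dtype=float)
    log_energy = np.log(np.dot(frame, frame))
    return np.append(pac(frame, transform), log_energy)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.standard_normal(256) * np.hamming(256)
    p = pac(frame)
    # Scaling the frame leaves PAC unchanged, because the energy is divided out
    print(np.allclose(p, pac(3.0 * frame)))
    print(pac_with_log_energy(frame).shape)
```

The scale-invariance check illustrates why energy information is lost after normalization, and `pac_with_log_energy` shows one simple way of putting it back. Passing `linear_transform` in place of `np.arccos` allows a quick comparison of the inverse cosine with a linear map of the same normalized correlations; the particular linear map used here is only an assumption.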

Similar resources

Improving the performance of MFCC for Persian robust speech recognition

The Mel Frequency cepstral coefficients are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in Automatic Speech Recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing step which applies to t...

Robust Features for Speaker-Independent Speech Recognition Based on a Certain Class of Translation-Invariant Transformations

The spectral effects of vocal tract length (VTL) differences are one reason for the lower recognition rate of today’s speaker-independent automatic speech recognition (ASR) systems compared to speaker-dependent ones. By using certain types of filter banks, the VTL-related effects can be described by a translation in subband-index space. In this paper, nonlinear translation-invariant transformatio...

Time and frequency filtering of filter-bank energies for robust HMM speech recognition

Every speech recognition system requires a signal representation that parametrically models the temporal evolution of the speech spectral envelope. Current parameterizations involve, either explicitly or implicitly, a set of energies from frequency bands which are often distributed in a mel scale. The computation of those energies is performed in diverse ways, but it always includes smoothing o...

Constrained Spectrum Normalization for Robust Speech Recognition in Noise

This paper presents a new approach to robust speech recognition in noise based on spectral subtraction. A conventional spectral subtraction technique leads to nonlinear distortions of the normalized speech signals and a resulting degradation of speech recognition accuracy. A new method is proposed to constrain spectral subtraction by imposing upper bounds on the estimates of the noise spectra. Tw...

Experiments with linear and nonlinear feature transformations in HMM based phone recognition

Feature extraction is the key element when aiming at robust speech recognition. In this work both linear and nonlinear data-driven feature transformations were applied to the logarithmic mel-spectral context feature vectors in the TIMIT phone recognition task. Transformations were based on Principal Component Analysis (PCA), Independent Component Analysis (ICA), Linear Discriminant Analysis (LD...

Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods

Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterances into transcriptions. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...

Publication date: 2003